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ABSTRACT 

We perform a generalized-ensemble simulation of a small peptide taking the inter- 
actions among all atoms into account. From this simulation we obtain thermodynamic 
quantities over a wide range of temperatures. In particular, we show that the folding of 
a small peptide is a multi-stage process associated with two characteristic temperatures, 
the collapse temperature and the folding temperature Tf. Our results give supporting 
evidence for the energy landscape picture and funnel concept. These ideas were previously 
developed in the context of studies of simplified protein models, and here for the first time 
checked in an all-atom Monte Carlo simulation. 
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It is well known that a large class of proteins fold spontaneously into globular states 
of unique shape, yet the mechanism of folding has remained elusive. The folding pro- 
cess may be either thermodynamically or kinetically controlled. The "thermodynamic 
hypothesis" assumes that the folded state corresponds to the global minimum in free en- 
ergy and is supported by the famous work of Anfinsen and similar experiments. On 
the other hand, Levinthal ||^ has argued that because of the huge number of local energy 
minima available to a protein, it is impossible to find the global free energy minimum 
by a random search in biological time scales (of order seconds). His argument rather 
suggests that the protein folds into a unique metastable state, the kinetically most ac- 
cessible structure. The complexity and importance of the problem raised a lot of interest 
in the subject over the last three decades, but no complete solution is in sight to date. 
However, significant new insight was gained over the last few years from the studies of 
minimal protein models. Both lattice models and continuum models |ll6|-||22|| 



have been extensively studied. Common to all these models is that they capture only few, 
but probably dominant interactions in real proteins. These include chain connectivity, 
excluded volume, hydrophobicity as the driving force, and sequence heterogeneity. For 



recent reviews on minimal protein models and their applications, see Refs. p3[-||26|. From 
the numerical and analytical studies of these models a new view of the folding process 
emerged. The folding kinetics is seen to be determined by an energy landscape which 
for foldable proteins resembles a funnel with a free energy gradient toward the native 



structure p, |T2], |T3|, |23|, The funnel is itself rough and folding occurs by a multi-stage, 
multi-pathway kinetics. A common scenario for folding may be that first the polypeptide 
chain collapses from a random coil to a compact state. This coil-to-globular transition 
is characterized by the collapse transition temperature Tq. In the second stage, a set 
of compact structures is explored. The final stage involves a transition from one of the 
many local minima in the set of compact structures to the native conformation. This final 
transition is characterized by the folding temperature Tf (< Tg). It was conjectured that 
the kinetic accessibility of the native conformation can be classified by the parameter il 



- = ^. (1) 

-l-e 

i.e., the smaller a is, the more easily a protein can fold. If Tq ^ Tf (i.e., a ~ 0), 



the second stage will be very short or not exist, and the protein will fold in an "all 
or nothing" transition from the extended coil to the native conformation without any 
detectable intermediates. On the other hand, for some proteins the folding process may 
involve further stages. A more elaborate classification of possible folding processes is 



discussed in Ref. IE3 



One can ask whether the picture outlined above is really useful to describe the folding 
kinetics of real proteins, because the underlying models are gross simplifications of real 
protein systems. For instance, side-chain conformational degrees of freedom that are 
important for packing are neglected. The situation actually resembles a vicious circle. 
The energy landscape picture and the analogy to phase transitions were developed from 
studies of the highly simplified description of proteins by minimal models. However, only 
if these concepts apply for proteins, it is possible to argue that the broad mechanism 
of phase transitions depends solely on gross features of the energy function, not on their 
details. Only in this case a law of corresponding states can be applied to explain dynamics 
of real proteins from studies of the folding kinetics in minimal models. It is therefore 
desirable to check the above picture by comparison with more realistic energy functions, 
namely, with all-atom simulations of a suitable protein. This is the purpose of the present 
article. While there has been an attempt to study the free energy landscape of an all-atom 
protein model by unfolding MD simulations [^, the present work starts from random 
initial conformations and is rather concerned with obtaining characteristic temperatures 
of protein folding by a Monte Carlo simulation (and thus studying the energy landscape 
indirectly) . 

Simulations of proteins where the interactions among all atoms are taken into account 
have been notoriously difficult (for a recent review, see Ref. I^^). The various competing 
interactions yield to a much rougher energy landscape than for minimal protein models. 
(In fact, one might question whether the limitations of the current energy functions may 
lead to rougher energy landscapes than the protein encounters in nature.) Simulations 
based on canonical Monte Carlo or molecular dynamics techniques will at low tempera- 
tures get trapped in one of the multitude of local minima separated by high energy bar- 
riers. Hence, only small parts of configuration space are sampled and physical quantities 
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cannot be calculated accurately. However, with the development of generalized- ensemble 
techniques like multicanonical algorithms |^ and simulated tempering |3^, an effi- 
cient sampling of low-energy configurations and calculation of accurate low-temperature 
thermodynamic quantities became feasible. The ffist application of one of these tech- 
niques to the protein folding problem can be found in Ref. |3^. Later applications of 
multicanonical algorithms include the study of the coil-globular transitions of a simplified 



model protein [|Tl[| and the helix-coil transitions of homo-oligomers of nonpolar amino acids 
p3|. A formulation of multicanonical algorithm for the molecular dynamics method was 



also developed [0, A numerical comparison of three different generalized-ensemble 
algorithms can be found in Ref. . 

The generalized-ensemble technique we utilize in this article was ffist introduced in 
Refs. ^ and is related to Tsallis generalized mechanics formalism In this 

algorithm, configurations are updated according to the following probability weight: 

where Eq is an estimator for the ground-state energy, np is the number of degrees of 
freedom of the system, and (3 = l/ksT is the inverse temperature with a low tempera- 
ture T (and ks is the Boltzmann constant). Obviously, the new weight reduces in the 
low-energy region to the canonical Boltzmann weight exp {-(3E) for SiEzM < 1. On 
the other hand, high-energy regions are no longer exponentially suppressed but only ac- 
cording to a power law, which enhances excursions to high-energy regions. In contrast to 
other generalized-ensemble techniques where the determination of weights is non-trivial, 
the weight of the new ensemble is explicitly given by Eq. (^. One only needs to find an 
estimator for the ground-state energy Eq which can be done by a procedure described 



in Ref. |^ and is much easier than the determination of weights for other generalized 
ensembles. Since the simulation by the present algorithm samples a large range of ener- 
gies, we can use the reweighting techniques to construct canonical distributions and 
calculate thermodynamic average of any physical quantity A over a wide temperature 
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range: 

/ dx A{x) w-\E{x)) e-^^(") 

<A>T = - — f , (3) 

dx w'\E{x)) e-^^(") 



where x stands for configurations. 

Here, we use these novel techniques to examine the picture for the folding kinetics as 
proposed from the simulations of minimal models. Limitations in available computational 
time force us to restrict ourselves on the simulation of small molecules, and we have 
in addition neglected explicit solvent interactions. The system of our choice is Met- 
enkephalin, one of the simplest peptides, with which we are familiar from earlier work 
p2| , |36| , |4T| . Met-enkephalin has the amino-acid sequence Tyr-Gly-Gly-Phe-Met. The 
potential energy function Etot (in kcal/mol) that we used is given by the sum of the 
electrostatic term E^s, 12-6 Lennard- Jones term E^j, and hydrogen-bond term E^i, for all 
pairs of atoms in the peptide together with the torsion term Etors for all torsion angles: 

Etot = Ees + Elj + Ehb + Etors, (4) 

E,. = (5) 

{id) ^^'^ 



\ ^12 ^6 ) 

\' ij ' ij / 



Elj = ^(^-^), (6) 

{id) V *i V / 
Etors = J2Uiii±cos{niXi)), (8) 

where rjj (in A) is the distance between the atoms i and j, and xi is the Z-th torsion an- 
gle. The parameters (g^, Aij, Bij, Cij, Dij, Ui and rii) for the energy function were adopted 
from ECEPP/2 g. The dielectric constant e was set equal to 2. In ECEPP/2 bond 
lengths and bond angles are fixed at experimental values. We further fix the peptide 
bond angles u to their common value 180°, which leaves us with 19 torsion angles (0, ip, 
and x) &s independent degrees of freedom (i.e., np = 19). The computer code KONF90 
p5| was used. We remark that KONF90 uses a different convention for the implemen- 
tation of the ECEPP parameters (for example, 0i of ECEPP/2 is equal to 0i — 180° of 
KONF90). Therefore, our energy values are slightly different from those of the original 
implementation of ECEPP/2. The simulation was started from a completely random 



initial conformation (Hot Start). One Monte Carlo sweep updates every torsion angle of 
the peptide once. 

It is known from our previous work that the ground-state conformation for Met- 
enkephalin has the KONF90 energy value Egs = —12.2 kcal/mol We therefore 

set Eo = -12.2 kcal/mol and T = 50 K (or, /? = 10.1 [ ^^J/^J ) (and np = 19) in 
our probability weight factor in Eq. @. The ground-state structure, exhibiting a 11'- 
type f3 turn, is shown in Fig. 1. It is a superposition of ball-and-stick and space- filling 
representations. The latter representation was added in order to give a rough idea of the 
volume of the peptide as discussed below. 

All thermodynamic quantities were then calculated from a single production run of 
1,000,000 MC sweeps which followed 10,000 sweeps for thermalization. At the end of every 
fourth sweep we stored the energies of the conformation, the corresponding volume, and 
the overlap of the conformation with the (known) ground state for further analyses. Here, 
we approximate the volume of the peptide by its solvent excluded volume (in A^) which is 
calculated by a variant of the double cubic lattice method [^. Our definition of the 
overlap, which measures how much a given conformation differs from the ground state, is 
given by 

0W-i-55i^|K"'-ar'l. (9) 

where af^ and af^^^ (in degrees) stand for the np dihedral angles of the conformation at 
t-th Monte Carlo sweep and the ground-state conformation, respectively. Symmetries for 
the side-chain angles were taken into account and the difference af ■* — ap'^'* was always 
projected into the interval [—180°, 180°]. Our definition guarantees that we have 

< <0>T < 1 , (10) 

with the limiting values 

r <o{t)>T ^1, T^o, 

I < 0{t) >T ^0 , T^oo . ^ ^ 



Let us now present our results. In Fig. 2a we show the "time series" of the total 
potential energy Etot- It is a random walk in potential energy space, which keeps the 
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simulation from getting trapped in a local minimum. It indeed visits the low-energy region 
several times in 1,000,000 Monte Carlo sweeps. The visits are separated by excursions into 
high-energy regions, which ensures de-correlation of the configurations. This can be seen 
in Figs. 2b and 2c, where time series of the excluded volume and the overlap function are 
displayed. The large changes in these quantities imply the large conformational changes 
which happen in the course of the simulation. Since large parts of the configuration 
space are sampled, the use of the reweighting techniques |^ is justified to calculate 



thermodynamic quantities over a wide range of temperatures by Eq. (|^). 

We expect the folding of proteins and peptides to occur in a multi-stage process. The 
first process is connected with a collapse of the extended coil structure into an ensemble 
of compact structures. This transition should be connected with a pronounced change in 
the average potential energy as a function of temperature. At the transition temperature 
we therefore expect a peak in the specific heat. Both quantities are shown in Fig. 3. 
We clearly observe a steep decrease in total potential energy around 300 K and the 
corresponding peak in the specific heat defined by 

^ _ 1 d{< Etot >t) _ ^2 < ^tot >T - < Etot >T 

^ = Wk^ dT N ' ^^^^ 

where N (= 5) is the number of amino-acid residues in the peptide. In Fig. 4 we display 

the average values of each of the component terms of the potential energy (defined in 

Eqs. (5)-(8)) as a function of temperature. As one can see in the Figure, the change 

in average potential energy is mainly caused by the Lennard- Jones term and therefore 

is connected to a decrease of the volume occupied by the peptide. This can be seen in 

Fig. 5, where we display the average volume as a function of temperature. As expected, 

the volume decreases rapidly in the same temperature range as the potential energy. The 

average volume is a natural measure of compactness, but the change from extended coil 

structures to compact structures with decreasing temperature can also be observed in 

other quantities like the average end-to-end distance < de-e >r (here, defined to be the 

distance between N of Tyr^ and O of Met^). In Table I, we give some of the values of 

< de-e >T as a function of temperature. The results imply again that the peptide is quite 

extended at high temperatures and compact at low temperatures. 

If both energy and volume decrease are correlated, then the transition temperature Tg 



can be located both from the position where the specific heat has its maximum and from 
the position of the maximum of 

^^^^ = (< FEtoi >T-<V >T< Etot >t) , (13) 

which is also displayed in Fig. 5. The second quantity measures the steepness of the 
decrease in volume in the same way as the specific heat measures the steepness of decrease 
of potential energy. To quantify its value we divided our time series in 4 bins corresponding 
to 250,000 sweeps each, determined the position of the maximum for both quantities in 
each bin and averaged over the bins. In this way we found a transition temperature 
Tg = 280 ± 20 K from the location of the peak in specific heat and = 310 ± 20 K 
from the maximum in d < V >t / dT. Both temperatures are indeed consistent with each 
other within the error bars. 

The second transition which should occur at a lower temperature Tf is that from a set 
of compact structures to the "native conformation" that is considered to be the ground 
state of the peptide. Since these compact conformations are expected to be all of similar 
volume and energy (systematic comparisons of such structures were tried in previous work 



| 48| , 1^, |50|), we do not expect to see this transition by pronounced changes in < Etot >t 
or to find another peak in the specific heat. Instead this transition should be characterized 
by a rapid change in the average overlap < O >t with the ground-state conformation 
(see Eq. (^) and a corresponding maximum in 

^^^^ ^ (< OEtoi >T-<0 >T< Etot >t) . (14) 
dl 

Both quantities are displayed in Fig. 6, and we indeed find the expected behavior. The 
change in the order parameter is clearly visible and occurs at a temperature lower than 
the first transition temperature Tg. We again try to determine its value by searching 
for the peak in d < O >t /dT in each of the 4 bins and averaging over the obtained 
values. In this way we find a transition temperature of Tf = 230 ± 30 K. We remark 
that the average overlap < O >t approaches its limiting value zero only very slowly as 
the temperature increases. This is because < O >t = corresponds to a completely 
random distribution of dihedral angles which is energetically highly unfavorable because 
of the steric hindrance of both main and side chains. 
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One characterization of the folding properties of a peptide or protein is given by the 
parameter a of Eq. (|l]). With our values for Tg and T/, we have for Met-enkephalin 
cr 0.2. Here, we have taken the central values: Tq = 295 K and Tf = 230 K. This value 
of a implies that our peptide has reasonably good folding properties according to Refs. 
and [|I^. We remark that the characterization of Met-enkephalin as a good folder has 
to be taken with care: Low-temperature simulations of the molecules with conventional 
methods are still a formidable task and a low value of a may not neccessarily indicate 
easy foldability in a computer simulation. 

While the collapse temperature Tg is roughly equal to room temperature, the tran- 
sition temperature Tf is well below room temperature. Consequently, contributions of 
ground-state conformers are not dominant at room temperature for Met-enkephalin, as 



was observed in our earlier work fS^, This is due to the small size of the peptide. 
However, it still can be regarded as a good model for a small protein, since it has a 
unique stable structure below Tf. It was shown in Refs. ||3^, ^ that Met-enkephalin 



remains mainly in the vicinity of the ground state without getting trapped in any of the 
local-minimum structures below Tf 230 K). 

We also performed a generalized-ensemble simulation with the same statistics for a 
second peptide, Leu-enkephalin (data not shown). We found: Tg = 300 ± 30 K and 
Tf = 220 ± 30 K. These transition temperatures are essentially the same as for Met- 
enkephalin. Both peptides are very similar, differing only in the side chains of the Met 
(Leu) residue. Our results indicate that indeed the general mechanism of the transition 
does not depend on these details and a law of corresponding state can be applied for 
similar peptides. 

Let us summarize our results. We have performed a generalized-ensemble simulation of 
a small peptide taking the interactions among all atoms into account and calculated ther- 
modynamic averages of physical quantities over a wide range of temperatures. We found 
for this peptide two characteristic temperatures. The higher temperature is associated 
with a collapse of the peptide from extended coils into more compact structures, whereas 
the second one indicates the transition between an ensemble of compact structures and 
a phase which is dominated by a single conformation, the ground state of the peptide. 
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Our results support pictures for the kinetics of protein folding which were developed from 
the study, both numerical and analytical, of simplified protein models. It is still an open 
question whether these minimal models can be used for predictions of protein conforma- 
tions. However, our analyses, performed with an energy function which takes much more 
of the details of a protein into account, demonstrate that these models are indeed able 
to describe the general mechanism of the folding process. Hence, the study of simplified 
models may in this way guide further simulations with more realistic energy functions. 
The present paper aims to be a first step in this direction. 
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Table I. Average end-to-end distance < dg-e >t (A) of Met-enkephalin as a function of 
temperature T (K). 
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Figure Legends 



FIG. 1: Ground-state conformation of Met-enkeplialin for KONF90 energy function. 
The figure was created witli Molscript and Raster 3D ||43 |. 



FIG. 2: Time series of total potential energy Efot (kcal/mol) (a), volume V (A^) 
(b), and overlap O (defined by Eq. (y)) (c) as obtained by a generalized-ensemble 
simulation of 1,000,000 Monte Carlo sweeps. 

FIG. 3: Average total potential energy < Efot >t and specific heat C as a function 
of temperature. The dotted vertical hue is added to aid the eyes in locating the peak 
of specific heat. The results were obtained from a generalized-ensemble simulation 
of 1,000,000 Monte Carlo sweeps. 

FIG. 4: Average potential energies as a function of temperature. The results were 
obtained from a generalized-ensemble simulation of 1,000,000 Monte Carlo sweeps. 

FIG. 5: Average volume < V >t and its derivative d < V >t /dT as a function of 
temperature. The dotted vertical line is added to aid the eyes in locating the peak 
of the derivative of volume. The results were obtained from a generalized-ensemble 
simulation of 1,000,000 Monte Carlo sweeps. 

FIG. 6: Average overlap < O >t and its derivative d < O >t /dT as a function of 
temperature. The dotted vertical line is added to aid the eyes in locating the peak 
of the derivative of overlap. The results were obtained from a generalized-ensemble 
simulation of 1,000,000 Monte Carlo sweeps. 
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